Guard init startup against OpenTelemetry boot-loop conditions #3423

Merged: gantoine merged 5 commits into master from copilot/fix-boot-loop-opentelemetry on May 24, 2026

Copilot AI commented May 24, 2026

RomM could enter a restart loop during container boot when OpenTelemetry wrapping was applied to startup/Gunicorn under incompatible OTEL env/runtime conditions. This change makes OTEL instrumentation conditional so the service still boots cleanly when OTEL is disabled or unavailable.

  • Startup path hardening (run_startup)

    • Execute startup.py directly when OTEL_SDK_DISABLED=true.
    • Fall back to direct execution if opentelemetry-instrument is not present.
    • Keep existing instrumented path when OTEL is enabled and wrapper is available.
  • Gunicorn launch hardening (start_bin_gunicorn)

    • Apply the same conditional behavior to backend process launch.
    • Preserve current OTEL-instrumented behavior in healthy OTEL configurations.
    • Add warning logs when falling back to non-instrumented execution.
  • Behavioral impact

    • Prevents repeated “Starting backend” loops caused by OTEL wrapper path failures.
    • Maintains compatibility with both OTEL-enabled and OTEL-disabled deployments.

```bash
if [[ ${OTEL_SDK_DISABLED:-false} == "true" ]]; then
  gunicorn ... main:app &
elif command -v opentelemetry-instrument >/dev/null 2>&1; then
  opentelemetry-instrument --service_name "${OTEL_SERVICE_NAME_PREFIX-}api" gunicorn ... main:app &
else
  warn_log "opentelemetry-instrument not found, starting gunicorn without OpenTelemetry instrumentation"
  gunicorn ... main:app &
fi
```
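
The same three-way decision applies to the startup path. The following is a minimal sketch rather than the PR's actual code: `run_with_optional_otel` is a hypothetical helper name, `warn_log` is stubbed because its real definition is not shown in this conversation, and `python3 startup.py` is an assumed way of invoking the startup script.

```bash
#!/usr/bin/env bash

# Stub for illustration; the real init script is assumed to define warn_log.
warn_log() { printf 'WARN: %s\n' "$*" >&2; }

# Hypothetical helper: run the given command, wrapping it with
# opentelemetry-instrument only when OTEL is enabled and the wrapper exists.
run_with_optional_otel() {
  if [[ ${OTEL_SDK_DISABLED:-false} == "true" ]]; then
    # OTEL explicitly disabled: execute directly.
    "$@"
  elif command -v opentelemetry-instrument >/dev/null 2>&1; then
    # Healthy OTEL setup: keep the instrumented path.
    opentelemetry-instrument "$@"
  else
    # Wrapper missing: warn and fall back instead of failing the boot.
    warn_log "opentelemetry-instrument not found, running without OpenTelemetry instrumentation"
    "$@"
  fi
}

# Assumed invocation of the startup script (not confirmed by the PR text).
run_startup() {
  run_with_optional_otel python3 startup.py
}
```

Because the fallback branch still runs the command, a missing wrapper degrades to an uninstrumented start instead of an exit that the container supervisor would retry.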

Copilot AI linked an issue May 24, 2026 that may be closed by this pull request
Co-authored-by: gantoine <3247106+gantoine@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix boot loop due to opentelemetry" to "Guard init startup against OpenTelemetry boot-loop conditions" May 24, 2026
Copilot AI requested a review from gantoine May 24, 2026 14:15
Collapse the duplicated OTEL_SDK_DISABLED / opentelemetry-instrument
branches in run_startup, start_bin_gunicorn, start_bin_watcher, and
start_bin_sync_watcher into two small helpers:

- otel_prefix: emits the wrapper as NUL-delimited argv tokens (for
  direct process invocation).
- otel_prefix_str: emits the wrapper as a shell-string prefix (for
  embedding inside `watchfiles --target-type command`).

Each call site becomes a single command instead of a 2- or 3-way
branch with a fully duplicated command body. As a side effect, the
watcher functions now also gain the `command -v opentelemetry-instrument`
fallback that the gunicorn/startup paths added.
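
A rough sketch of how two such helpers might look, assuming bash 4.4 or newer (for `mapfile -d ''`). The function bodies, the `run_wrapped` caller, the per-service name argument, and the `warn_log` stub are illustrative assumptions, not the code from this commit; in particular, the string variant here uses `printf '%q'`, which may differ from what the commit did.

```bash
#!/usr/bin/env bash

# Stub for illustration; the real init script is assumed to define warn_log.
warn_log() { printf 'WARN: %s\n' "$*" >&2; }

# Emit the wrapper as NUL-delimited argv tokens. Emits nothing when OTEL is
# disabled or the wrapper is not available.
otel_prefix() {
  local service=$1
  if [[ ${OTEL_SDK_DISABLED:-false} == "true" ]]; then
    return 0
  fi
  if ! command -v opentelemetry-instrument >/dev/null 2>&1; then
    warn_log "opentelemetry-instrument not found, running without OpenTelemetry instrumentation"
    return 0
  fi
  printf '%s\0' opentelemetry-instrument --service_name \
    "${OTEL_SERVICE_NAME_PREFIX-}${service}"
}

# Emit the same wrapper as a shell-string prefix with a trailing space,
# suitable for embedding in a command string. Each token is escaped with %q.
otel_prefix_str() {
  local -a tokens=()
  mapfile -d '' -t tokens < <(otel_prefix "$1")
  ((${#tokens[@]})) || return 0
  printf '%q ' "${tokens[@]}"
}

# Hypothetical call site: read the tokens into an array and run the command
# behind them. An empty array expands to nothing, so the command runs bare.
run_wrapped() {
  local -a wrap=()
  mapfile -d '' -t wrap < <(otel_prefix "$1")
  shift
  "${wrap[@]}" "$@"
}
```

NUL delimiters keep each argv token intact even if a service name contains spaces or newlines, which is why the direct-invocation form does not need any shell quoting.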
gantoine marked this pull request as ready for review May 24, 2026 18:22
Copilot AI review requested due to automatic review settings May 24, 2026 18:22
Copilot AI left a comment

Pull request overview

This PR hardens the container init script so OpenTelemetry instrumentation is only applied when enabled and available, avoiding startup loops when the wrapper cannot be used.

Changes:

  • Adds helper functions to generate OpenTelemetry wrapper prefixes.
  • Uses those helpers for startup, Gunicorn, watcher, and sync watcher launch paths.
  • Falls back to direct execution when OTEL is disabled or the wrapper is unavailable.

Comment thread docker/init_scripts/init Outdated
gantoine and others added 2 commits May 24, 2026 14:57
Collapse `otel_prefix` and `otel_prefix_str` into a single nameref-based
helper. Watchfiles call sites embed the array as a shell-quoted prefix
via `${wrap[*]@Q}`, which also fixes a quoting bug where an
`OTEL_SERVICE_NAME_PREFIX` containing a single quote would produce an
invalid command string and break the watcher.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
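
The single-helper design can be sketched as follows, assuming bash 4.4 or newer. Bash spells the quoting transformation with an uppercase `Q` (`${name[*]@Q}`). The names `otel_wrap`, `start_direct`, and `watch_command`, and the `warn_log` stub, are hypothetical and not taken from the commit.

```bash
#!/usr/bin/env bash

# Stub for illustration; the real init script is assumed to define warn_log.
warn_log() { printf 'WARN: %s\n' "$*" >&2; }

# Append the wrapper tokens to the caller's array, which is passed by name.
# The array is left untouched when OTEL is disabled or the wrapper is missing.
otel_wrap() {
  local -n _out=$1   # nameref to the caller's array
  local service=$2
  if [[ ${OTEL_SDK_DISABLED:-false} == "true" ]]; then
    return 0
  fi
  if ! command -v opentelemetry-instrument >/dev/null 2>&1; then
    warn_log "opentelemetry-instrument not found, running without OpenTelemetry instrumentation"
    return 0
  fi
  _out+=(opentelemetry-instrument --service_name
         "${OTEL_SERVICE_NAME_PREFIX-}${service}")
}

# Direct invocation: expand the array as argv tokens in front of the command.
start_direct() {
  local -a wrap=()
  otel_wrap wrap "$1"
  shift
  "${wrap[@]}" "$@"
}

# String form: quote every token with @Q so the result survives being parsed
# again by a shell, for example as the target of
# `watchfiles --target-type command`.
watch_command() {
  local -a wrap=()
  otel_wrap wrap "$1"
  local prefix="${wrap[*]@Q}"
  printf '%s\n' "${prefix:+$prefix }$2"
}
```

With `OTEL_SERVICE_NAME_PREFIX="it's-"`, the `@Q` form escapes the embedded single quote, so the generated string still parses as one command. Each caller starts from an empty `wrap` array, so the helper only ever appends.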
All callers declare a fresh `local -a wrap=()` before invoking, so the
in-function reset is unnecessary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

gantoine merged commit 7912769 into master May 24, 2026
9 checks passed
gantoine deleted the copilot/fix-boot-loop-opentelemetry branch May 24, 2026 19:29

Development

Successfully merging this pull request may close these issues.

[Bug] Boot loop due to opentelemetry

4 participants